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Abstract 

We  present  an  algorithm  for  the  parallel  solution  of  tridiagonal  and  penta- 
diagonal  linear  systems  having  nonzero  elements  at  the  top  right  and  bottom 
left  corners.  Tridiagonal  systems  of  this  kind  arise  from  the  solution  of  two 
point  boundary  value  problems  with  periodic  boundary  conditions.  Pentadiag- 
onal  systems  of  this  kind  arise  from  e.g  the  approximation  of  the  shallow  water 
equations  by  the  two-stage  Galerkin  method  combined  with  a  high  accuracy 
compact  approximation  to  the  first  derivative  (Navon,  1983). 


1*  Introduction 

In  this  paper,  we  develop  an  algorithm  for  the  parallel  solution  of  tridiagonal 
and  pentadiagonal  linear  systems  having  nonzero  elements  at  the  top  right  and 
bottom  left  corners.  This  is  a  generalization  of  an  algorithm  due  to  Kowalik  et 
al  (1984)  for  tridiagonal  systems.  Such  tridiagonal  systems  arise  when  approxi¬ 
mating  a  class  of  two-point  boundary  value  problems  having  periodic  boundary 
conditions: 

=  0<<<i,  (1) 

j/(0)=y(i),  (2) 

y'(0)=y'(i).  (3) 

It  was  shown  by  Katti  (1995)  that  this  problem  leads  to  the  system 

Ay  =  d  (4) 

where  the  matrix  A  is  of  the  form 


Oi 

Cl 

0 

0 

b2 

C2 

0 

0 

bs 

Cl3 

C3 

0 

0 

0 

bN-1 

ayv-i 

9i 

0 

0 

bN 

and  the  right  hand  side  d  is 


Pentadiagonal  as  well  as  tridiagonal  systems  appear  when  the  two-stage 
Galerkin  method  combined  with  a  high  accuracy  compact  approximation  to  the 
first  derivative  is  used  for  the  approximation  of  the  shallow  water  equations  with 
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periodic  boundary  conditions  (Navon,  1983).  In  this  case,  one  has  to  solve  a 
pentadiagonal  system  of  the  form 


Bx  =  d  (7) 

where  the  matrix  B  is  given  by 
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In  the  next  section  we  describe  the  parallel  algorithm  for  the  direct  solution  of 
the  tridiagonal  system.  The  algorithm  for  the  pentadiagonal  system  is  described 
in  section  3.  Numerical  experiments  with  both  algorithms  are  reported  in  section 
4.  The  two  programs  are  attached  as  appendices. 

2.  Algorithm  for  Tridiagonal 

In  this  section,  we  generalize  the  algorithm  developed  by  Kowalik  et  al  [1] 
for  tridiagonal  systems  to  the  case  where  the  matrix  A  is  given  by  (5).  We 
follow  Kowalik  in  our  description.  Divide  the  N  equations  equally  among  the 
TT  processors  (some  may  have  1  more  equation  than  others).  Let’s  assume  for 
simplicity  that  each  processor  gets  k  equations.  The  first  step  is  to  eliminate  bj 
which  are  the  elements  below  the  diagonal.  Each  processor  1  <  i  eliminates 
bj  for  {i  —  l)k  +  2  <  j  <  ik. 

For  j  =  (i  —  1)A:  +  2, . . . ,  fAr 
f{i-i)k+i  ^  only  for  i  ^  1 

Compute  the  multiplier  mj  =  and  update 
f.  ^  f.  -  only  for  /  ^  1 

dj  ^ —  CLj  —  TTljCj  —  X 

Pj  ^  Pj  ^  only  for  i  =  1 

dj  dj  —  rrijdj-i 

This  process  creates  the  new  elements  fj  and  pj^  as  can  be  seen  in  (9)  for 
the  case  tt  =  3. 
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(l\  C\ 

0  02  C2 


0 


0 

0 


0 

0 


Pi 

P2 


The  second  step  is  the  elimination  of  cj  which  are  the  elements  above  the  diag¬ 
onal.  Each  processor  1  <  2  <  tt  eliminates  Cj  for  ik  —  2  <  j  <  {i  -  1)A:  -f  1. 

For  j  =  z7:  2, . . . ,  {i  —  l)Ar  -h  1 

3ik  —  \  ^  ^ik  —  \ 

Compute  the  multiplier  ruj  =  and  update 

/i  ^  fj  -  ^  ^  ^ 

9j  ^  9j  ~  '^j9j+i 

Pj  <r-  pj  —  rrijPj+i  only  for  z  =  1 

dj  i  dj  TTljdj^i 

Note  that  for  each  processor  z,  the  elements  Cik  remain  and  can  only  be 
eliminated  by  passing  the  first  row  of  processor  z  -h  1 .  Upon  completing  these 
steps,  the  system  becomes 
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/ai  0  •••  9i  0 

0  02  0  ;  0 


0  0  .  Pi  \ 

0  0  .  P2  \ 


dk-l  9k-l 


0 

0 
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0 

0 

gk 

0 

0 

0  ••• 

f fe+1 

n/c+1 

0 

9k+l 

0 

0 

0 

0 

f2k 

0 

0 

••  a2k-l  92k-l 

0  a2k 

0  •••  • 

a2k 

0 

0  ••• 

0 

0 

f2k+l 

0‘2k-{-i  0 

92k+l 

(10) 


r 

\9i 


0  •••  00 

0  •••  00 


/sfc-i  0  •••  g^k-\  I 

fzk  0  .  dzk  ) 


If  we  now  take  the  first  equation  from  processor  1  and  the  last  equation 
from  each  processor,  we  end  up  with  a  similar  tridiagonal  system  with  only 
TT  +  1  equations: 


(ai  9i  0  Pi  \ 

fk  cik  gk  0  I 

0  /2fc  (l2k  g2k  I 

qi  0  fsk  (^zk  / 


(11) 


where  fk  is  created  upon  elimination  of  pk  using  the  first  row.  In  general,  the 
matrix  is 

/ai  gi  0 

fk  o.k  gk  0 

0  f2k  <^2k  g2k 

0  .  /(7r-l)fc 

\  gi  0  ■ • •  0 

This  system  can  be  solved  either  serially  or  in  parallel  and  is  referred  hence¬ 
forth  as  the  reduced  system.  The  solution  of  this  system  is  then  broadcasted 
to  all  the  processors.  Upon  receiving  the  solution  of  the  reduced  system,  each 
processor  i  solves  for  the  unknowns  X(^i_i)k+i,  *  *  • ,  Xik^i  in  parallel  by  a  straight¬ 
forward  backward  substitution.  Note  that  for  processor  1,  the  term  x\  need  not 
be  solved  for  since  it  is  already  known  by  virtue  of  the  solution  of  the  reduced 
system. 


Pi  \ 

0 


a(7r-l)/: 

firk 


g{n-i)k 

Ctirk 


(12) 
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3.  Algorithm  for  Pentadiagonal 

In  this  section,  we  develop  an  algorithm  for  the  pentadiagonal  system  (7), 
There  are  a  few  differences  between  this  algorithm  and  the  previous  one.  First, 
the  reduced  system  contains  the  first  t^  equations  from  the  first  processor  and 
the  last  two  equations  from  each  processor.  Thus  the  reduced  system  is  of  order 
2(7r-|-  1).  Second,  each  processor  i  requires  two  equations  from  processor  i  +  1 
to  eliminate  certain  entries  above  the  diagonal  {cj  and  bj). 

We  now  give  the  details  of  the  algorithm,  sissuming  again  tt  =  3  processors 
each  containing  k  equations.  The  first  step  is  to  eliminate  the  entries  below  the 
diagonal,  namely  Sj  and  rj .  Each  processor  1  <  z  <  tt  performs  this  task  on  the 
equations  (z  —  1)A:  4-  2  <  j  <  z^  1. 

For  j  =  (z  —  l)k  +  2, . . . ,  z/:  —  1 
/l(*-i);c+i  ^  s(i_i)fc+i  only  for  i  ^  1 

/2(i^i);:+i  r(^i^i)k^i  only  for  z  /  1 

/2(i._i);c+2  ^  Hi-i)k+2  only^for  i  7^  1 

Compute  the  multiplier  mj  =  and  update 
/Ij  ^  fh  -  only  for  z  ^  1 

f2j  i-  f2j  -  mjf2j-i  only  for  z  7^  1 

Q)J  ^  Clj  ZTZj  Cj  _  j 
Cj  <r-  Cj  —  rrijbj^i 

Pj  ^  pj  *"  TUjPj-i  only  for  z  =  1 
^3  ^  ^3  "■  only  for  z  =  1 

dj  i —  dj  —  TTijdj^\ 

Now  compute  the  other  multiplier  rij  =  update 

/Ij+i  f\j+i  -  only  for  i  #  1 

f2j+i  <-  /2j+i  -  njf2j-i  only  for  *  1 

rj+i  <-  rj+i  -  njCj-i 

Clj  +  l  f-  Clj+l  ~  Tljbj-i 

Pj+I  <-  Pj+i  -  UjPj-i  only  for  i  =  1 

qj+i  9j+i  -  only  for  i  =  1 

dj+i  ^  ~  Tijdj-i 

Note  that  for  each  processor  i,  the  elements  r,fc  remain  and  are  never  elim¬ 
inated.  Because  the  semi-bandwidth  of  the  matrix  has  been  increased  by  one, 
then  so  does  the  number  of  equations  that  need  to  be  passed  between  processors. 
The  matrix  B  becomes 
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/ai 
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qi 
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0 
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_ _ 
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flfe+i 
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/hfe-i 

/22fc-l 
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Cl2k-1 

C2k-1 

b2k-l 

0 

0 

0 

flik 

/22fc 

0 

r2k 

°2fe _ 

_ £2k _ 

b2k 

0 

0 

0 

0 

0 

f^2k  +  l 

/22^  +  1 

a2/:  +  l- 

C2fc  +  1 

^^2fc+l 

Wl 

0 

0 

0 

/Isfe-i 

0 

C3fc-1 

\  Di 

V2 

0 

0 

fhk 

/23A: 

0 

(^Sk 

The  second  step  is  to  eliminate  the  entries  above  the  diagonal,  namely  Cj 
and  bj.  Each  processor  1  <  «  <  tt  performs  this  task  on  the  equations  ik  —  ^  < 

3  <  («  “•  1)^  +  1- 

For  j  ^  ik  —  ,  {i  -1)^+1 

gliks  ^  ^ik-3 

glik^2 

g2ik-2  ^  bik-2 

Compute  the  multiplier  m j  =  and  update 

/Ij  <r-  flj  -  rrijflj+i  only  for  i  ^  1 

f2j  f2j  -  mjf2j^i  only  for  i  ^  1 

glj  ^glj 

g2j  <-  g2j  -  mjg2j^i 

p.  4r-  pj  ^  rrijpj+i  only  for  i  =  1 

Qj  4-  qj  -  rrijqj+i  only  for  i  =  I 

dj  ^ —  dj  —  TTij  dj^i 

Now  compute  the  other  multiplier  rtj  —  and  update  iff  j  >  (z  —  l)Ar  -}- 1 
f\j-i  ^  flj^i  -  nj/lj+i  only  for  i  /  1 

f2j^i  ^  /2j_i  -  nj/2j+i  only  for  i  ^  1 

glj^i  <r-  g\j-i  —  Tijglj^i 

g2j^\  g2j^i  —  njg2j^i 

Pj^i  <r-  Pj^i  -  TijPj^i  only  for  i  =  1 

qj^i  ^  qj^i  -  rijqj^i  only  for  i  =  1 

dj^i  4-  dj-i  —  njdj^i 

Note  that  this  process  does  not  eliminate  the  elements  bik,  and  Cik  for 

each  processor  i.  Thus  we  have  the  following  system 
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0  ••• 
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0  . 

0 

0 

0 

•  •  •  fhk 

fhk 

0 
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b2k  0 

0 

0 

0 

0 

0 

fhk+l 
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02k+l 

0  ^l2fc+l 

g‘^2k+l 

l>3k-l 

0 

0 

0 

CO 

1 

/23fc-l 

0 

dSk-l 

Czk-l 

\  C3k 

bsk 

0  ••• 

0 

0 

fhk 

/23fe 

0 

rsk 

03k 

To  eliminate  bik-i, 

bik  and  Cjk 

,  the  t*'* 

processor  requires  the  first  two  rows 

of  processor  i  +  1.  The  elimination  creates  four  new  elements  gUk^i,  g^ik^i^ 

glik,  and  g2ik. 

By  eliminating  p2  using  row  1  and  Pk-ij  Qk- 

.1,  Pfc,  and  qk  using 

rows  1  and  2  we  then  obtain  the  following  reduced  system 

Oi 

Cl 
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0 

Pi 

qx  \ 
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gh 
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0 

92 
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/l2Jfe-l 

/22A:^1 

02k-l 

C2k-1 

gl2k-l 

g22k-l 

0 

0 

/I2/C 

722;: 
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f^sk-l 
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\  Csk 

b3k 

0 

0 
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f23k 
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Note  that  the  elimination  of  p2  only  generates  one  new  element  r2,  while  the 
elimination  of  the  other  p’s  and  g’s  creates  the  four  new  entries  /1/c-i, 

/U,  and  /2/f .  Note  that  an  additional  term  Ci  has  appeared  but  this  term  is  zero 
and  is  only  included  for  illustrating  the  general  structure  of  the  reduced  system. 
This  matrix  is  a  block  tridiagonal  system  with  nonzero  blocks  at  the  top  right 
and  bottom  left  corners,  i.e.  a  block  form  of  (11).  This  system  can  be  solved 
serially  by  the  block  version  of  the  solver  used  for  (11).  The  solution  is  then 
broadcasted  to  all  the  processors.  Upon  receiving  this  solution,  each  processor 
i  solves  for  the  unknowns  X(,-.i);;+2) '  ’  ‘5  ^ik-2  in  parallel  by  a  straightforward 
backward  substitution.  Note  that  for  processor  1,  the  terms  xi  and  X2  need 
not  be  solved  for  since  they  are  already  known  by  virtue  of  the  solution  of  the 
reduced  system. 


8 


4,  Numerical  Experiments 

In  this  section,  we  present  the  numerical  experiments  used  to  test  the  effi¬ 
ciency  of  the  tridiagonal  and  pentadiagonal  algorithms.  For  both  algorithms, 
the  number  of  processors  tt  and  equations  per  processor  k  are  varied  but  the 
total  number  of  equations  remains  fixed  at  AT  =  10,  000.  All  runs  are  performed 
on  a  cluster  of  Sun4  workstations  running  PVM. 

For  the  tridiagonal  algorithm,  the  matrix  shown  in  equation  (5)  is  defined 
as 

j)^  n:  —  1  ^  2 ,  Cj  — -  1 

and  d,-  =  0  for  z  =  1, . . A  with  the  additional  conditions  that  di  =  —N  and 
djv  =  N,  The  corner  terms  in  equation  (5)  are  defined  as  pi  =  bi  and  qi  =  c^- 
This  system  then  has  the  exact  solution  Xi  =  i  for  i  =  1, . .  .iV.  Such  a  matrix 
system  arises  from  a  finite  difference  or  finite  element  centered  discretization  of 
the  one-dimensional  Laplacian  operator  with  periodic  boundary  conditions. 

Table  1  shows  the  timing  results  obtained  for  the  tridiagonal  algorithm  using 
averages  based  on  five  consecutive  runs.  Results  are  tabulated  for  tt  =  2,4,8 
and  10  processors. 


Number 

of 

Processors 

Number 
of  Equations 
Per  Processor 

Parallel 

Time 

(seconds) 

Serial 

Time 

(seconds) 

Speedup 

Efficiency 

(%) 

2 

5000 

0.10 

0.11 

1.1 

55.0 

4 

2500 

0.22 

0.11 

0.5 

12.5 

8 

1250 

0.22 

0.11 

0.5 

6.3 

10 

1000 

0.12 

0.11 

0.9 

9.0 

Table  1:  Timings  for  the  parallel  and  serial  versions  of  the  tri diagonal  algo¬ 
rithm.  Speedup  is  defined  as  the  time  ratio  between  the  serial  and  parallel 
algorithm.  Efficiency  is  the  ratio  of  speedup  to  the  number  of  processors. 

The  serial  tridiagonal  algorithm  used  for  the  purpose  of  computing  speedup 
rates  is  a  partitioning  algorithm.  The  linear  system  (3)  to  be  solved  can  be 
written  as 

Ax  =  b 

which  can  then  be  partitioned  as  such 

A{x^  +  XnX^)  =b-  Ai^nXn 

and  this  system  can  be  decomposed  into  the  two  corresponding  systems 

Ax^  =  b 
Ax"^  —  • 
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These  two  systems  can  now  be  solved  via  an  LU  decomposition.  This  algorithm 
is  the  periodic  variant  of  the  Thomas  algorithm  which  is  a  very  fast  tridiagonal 
solver.  Table  1  shows  that  the  parallel  version  is  competitive  with  the  serial 
algorithm  especially  for  tt  =  2  and  ;r  =  10  processors.  For  n  =  i  and  tt  =  S 
processors  the  parallel  algorithm  is  deficient  relative  to  its  serial  counterpart. 
Nonetheless,  these  results  demonstrate  that  no  significant  gains  are  made  by 
parallelizing  the  tridiagonal  solver. 

For  the  pentadiagonal  algorithm,  the  matrix  shown  in  equation  (8)  is  defined 

ctS 

Sj  ^  1 ,  Vi  —  1 ,  Uj  —  4,  C(  —  1 ,  bi  1 

and  d,-  =  0  for  f  =  1, . . . ,  AT  with  the  additional  conditions  that  di  =  -2N , 
d2  =  —N,  div-i  —  AT  and  The  corner  terms  in  equation  (5)  are 

defined  as  pi  —  5i,  =  ri,  ^2  =  ^2)  =  ^iv-i?  and  t;2  =  6jv-  This 

system  then  has  the  exact  solution  or,-  =  z  for  2  =  1, ...  AT.  Such  a  matrix  system 
arises  from  the  finite  difference  centered  discretization  of  the  two-dimensional 
Laplacian  operator  with  periodic  boundary  conditions. 

Table  2  shows  the  timing  results  obtained  for  the  pentadiagonal  algorithm 
using  averages  based  on  five  consecutive  runs. 


Number 

of 

Processors 

Number 
of  Equations 
Per  Processor 

Parallel 

Time 

(seconds) 

Serial 

Time 

(seconds) 

Speedup 

Efficiency 

(%) 

2 

5000 

0.11 

0.22 

2.0 

100.0 

4 

2500 

0.22 

0.22 

1.0 

25.0 

8 

1250 

0.22 

0.22 

1.0 

12.5 

10 

1000 

0.13 

0.22 

1.7 

17.0 

Table  2:  Timings  for  the  parallel  and  serial  versions  of  the  pentadiagonal 
algorithm.  Speedup  is  defined  as  the  time  ratio  between  the  serial  and  parallel 
algorithm.  Efficiency  is  the  ratio  of  speedup  to  the  number  of  processors. 

The  serial  pentadiagonal  algorithm  used  for  the  purpose  of  computing  speedup 
rates  is  the  pentadiagonal  version  of  the  tridiagonal  partitioning  algorithm  pre¬ 
sented  above.  In  this  version  the  linear  matrix  system  (8)  can  be  written  as 

Bx  =  b 


and  partitioned  as 

B{x^  +  =b  -  Bi^n--l^n-l  “  Bi^n^n 

and  this  system  can  be  decomposed  into  the  three  corresponding  systems 

Bx^  =  b 
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Bx^  = 

Bx^  = 

These  three  systems  can  now  be  solved  via  an  LU  decomposition.  Table  2  shows 
that  the  parallel  version  does  provide  a  significant  gain  over  the  serial  algorithm 
for  all  of  the  number  of  processors  studied.  Once  again,  the  major  gains  are 
achieved  for  tt  =  2  and  tt  =  10  processors.  In  addition  it  is  worth  considering 
that  the  times  for  the  parallel  tridiagonal  and  pentadiagonal  algorithms  are 
almost  identical.  The  communication  is  extremely  detrimental  to  the  overall 
performance  of  the  tridiagonal  algorithm,  while  for  the  pentadiagonal  it  begins 
to  pay  dividends.  This  study  hints  at  the  fact  that  as  the  bandwidth  of  the 
matrix  is  increased,  the  better  the  possible  performance  of  this  parallel  algorithm 
versus  its  serial  counterpart.  The  algorithm  described  is  generalizable  to  any 
banded  system.  For  a  system  with  semi-bandwidth  /?  ,  then  /?  equations  need 
to  be  passed  between  processors  and  the  reduced  system  is  of  order  (3{ir  + 
1).  Thus  as  the  semi-bandwidth  becomes  sufficiently  large,  the  majority  of 
the  communication  time  is  spent  on  the  actual  transmission  of  information  as 
opposed  to  the  overhead  incurred  in  communication  calls. 

5.  Conclusions 

An  algorithm  for  tridiagonal  and  pentadiagonal  matrices  with  nonzero  ele¬ 
ments  at  the  top  right  and  bottom  left  corners  is  presented.  The  algorithm  is 
generalizable  to  higher  banded  systems  but  numerical  studies  are  only  performed 
for  the  tridiagonal  and  pentadiagonal  cases.  The  numerical  studies  show  that 
the  communication  overhead  severely  hurts  the  performance  of  the  tridiagonal 
case.  However,  the  results  also  show  that  gains  are  made  in  the  pentadiagonal 
case  and  this  trend  points  at  the  possibility  of  further  gains  when  the  band¬ 
width  of  the  matrix  is  increased  beyond  the  pentadiagonal  case.  The  algorithm 
becomes  a  bit  more  cumbersome  to  implement  but  it  does  extend  rather  well  to 
higher  bandwidths.  The  number  of  equations  that  are  required  to  be  passed  and 
the  order  of  the  reduced  system  increase  at  the  rates  /?  and  /?(7r-f  1),  respectively, 
where  (3  is  the  semi-bandwidth  of  the  matrix. 
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Appendix  A 


In  this  appendix  we  give  the  tridiagonal  solver.  The  first  code  is  the  parent 
(master)  followed  by  the  child  (slave).  The  tridiagonal  solver  for  the  reduced 
system  is  also  given. 

^ - * 

♦This  is  the  PARENT  of  a  program  which  solves  the 

♦tridiagonal  system  A  x  =  b  with  periodic  boundary  conditions  using  a 

♦Tridiagonal  Decomposition  Method 

♦as* by  B.  Neta,  C.P.  Katti  and  F.X.  Giraldo. 

♦Programmed  by  F.X.  Giraldo  on  7/95 

* - * 

program  parents 
include  ^fpvmS.h^ 
include  *param.h* 

! Global  Arrays 

dimension  a(imax),  b(imax) ,  c(imax),  d(imax),  x(imax) 
dimension  a_p(nk),  b_p(nk) ,  c_p(nk),  d_p(nk),  xx(nk) 
dimension  fhat (nprocs+1) ,  ahat (nprocs+1) 
dimension  ghat (nprocs+1) ,  dhat (nprocs+1) 
integer  tids(16) 
real  taray(2) 

character  arch^S,  nodename^lO 
npoin=nprocs+nk 

do  i=l,npoin  ! Construct  Linear  System 

b(i)=-l 
a(i)=4 
c(i)=-l 

d(i)=4  +  2^(i-2) 
end  do 

d(l)=-npoin  +  2 
d(npoin)=3^npoin 

! Start  PVM 

call  pvmfmytid(  mytid  ) 
call  pvmf parent (  iptid  ) 
nodename=  *  childS . exe  * 
arch=  * ♦ * 

!  Spawn  Processors  eind  distribute 
!the  Linear  System  among  processors 

do  i=l,nprocs 

call  pvmf spawn (nodename, pvmdefault , arch, l,tids(i) ,ierr) 
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write(*,*("  Spawning  Processor  tids  ", i2, lx, ilO) Oi,tids(i) 
if  (ierr.ne.l)  then 

write(* ,  H"ierr  =  ",i3)Oierr 

write(*,  ^  ("Error!  Could  not  spawn  process  #  ",i3)Oi 
call  pvmfexit(ierr) 
stop 
endif 
do  j=l,nk 

b_p(j)=b(  (i-l)*nk  +  j  ) 

a_p(j)=a(  (i-l)*nk  +  j  ) 

c_p(j)=c(  (i-l)*nk  +  j  ) 

d.p(j)=d(  (i-l)*nk  +  j  ) 

end  do 

msgtype=l 

cal 1  p vmf ini t  s  end ( pvmdef  ault , inf  o ) 
call  pvmf pack(integer4, i ,1,1, info) 
call  pvmf pack (real4,b_p,nk, 1 , info) 
call  pvmf pack (real4,a_p,nk, 1 , info) 
call  pvmf pack(real4,c_p,nk,l, info) 
call  pvmfpack(real4,d_p,nk, 1 , info) 
call  pvmf send( tids (i) ,msgtype, inf o) 
end  do 

{Broadcast  TIDS  to  all  Processors 

msgtype=2 

call  pvmf initsend(pvmdef ault , info) 
call  pvmf pack (integer4, tids ,16, 1, info) 
call  pvmf mcast (nprocs , t ids , msgtype , info ) 

{Construct  Arrow  System 

t ime 1 =dt ime ( t ar ay ) 

msgtype=4 

do  i=l, nprocs 

call  pvmf recv(-l, msgtype, inf o) 
call  pvmfunpack(integer4,iprocs, 1,1, info) 
if  (iprocs.eq.  1)  then  !For  i=l,2 

call  pvmfunpack(real4,pl, 1, 1 , inf o) 

call  pvmf unpack (real4 , al , 1 , 1 , info) 
call  pvmfunpack(real4,gl, 1, 1, info) 
call  pvmfunpack(real4,dl, 1,1, info) 
call  pvmfunpack(real4,f2, 1, 1, info) 
call  pvmf unpack (real4,a2, 1, 1, info) 

call  pvmf unpack (real4 , g2 , 1 , 1 , info ) 
call  pvmf unpack ( r eal4, d2 , 1,1, info) 
fhat(l)=pl 
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sdiat  (l)=al 
ghat(l)=gl 
dhat(l)=dl 
fhat(2)=f2 
ahat(2)=a2 
ghat(2)=g2 
dhat(2)=d2 

else  if  (iprocs .gt . 1 . cind, iprocs . It .nprocs)  then  !For  i=3,NPR0CS 
call  pvmf unpack (real4,ff , 1 , 1, info) 
call  pvmf unpack (real4,aa, 1,1, info) 
call  pvmf unpack(real4,gg, 1,1, info) 
call  pvmf unpack (real4,dd, 1, 1, info) 
fhat (iprocs+l)=ff 
ahat (iprocs+l)=aa 
ghat ( iprocs+1 ) =gg 
dhat (iprocs+l)=dd 

else  if  (iprocs . eq.nprocs)  then  ‘For  i=NPR0CS  +  1 

call  pvmf unpack (real4,ff ,1,1, info) 
call  pvmfunpack(real4,aa, 1,1, info) 
call  pvmf unpack ( real4 , c  c , 1 , 1 , info ) 
call  pvmf unpack (real4 ,dd ,1,1, info) 
fhat (npro  CS+ 1 ) =f f 
ahat (nprocs+l)=aa 
ghat(nprocs+l)=cc 
dhat ( npr o  c  s + 1 ) =dd 
endif 
end  do 

! Solve  Tridiagonal  Arrow  System 
call  tridiag_periodic (dhat , fhat , ahat , ghat ,nprocs+l) 

‘Store  Output  in  corresponding  vector 

x(l)=dhat(l) 
x(nk)=dhat (2) 
do  i=3,nprocs+l 

x((i’-l)*nk)=dhat(i) 
end  do 

‘Back  Substitute 

! Broadcast  TIDS  to  all  Processors 

msgtype=5 

call  pvmf initsend(pvmdefault, info) 
call  pvmfpack(real4,x,imax, l,info) 
call  pvmfmcast (nprocs , tids ,msgtype , info) 

! Receive  Local  Sol  from  Children 

msgtype=6 
do  i=l, nprocs 
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call  pvmfrecv(-l,msgtype,info) 

call  pvmf unpack ( integer4 , iprocs ,1,1, info) 

call  p vmf unpack (real4, XX, nk, 1 , info) 

print*, ^  Receiving  local  solution  from  Processor=  *, iprocs 
if  (iprocs . eq. 1)  then 
do  j=2,  nk  -  1 
x(j)=xx(j) 
end  do 
else 

do  j=l,  nk  -  1 

X ( ( iprocs - 1 ) *nk+ j ) =xx ( j ) 
end  do 
endif 
end  do 

t ime 2=dt ime ( t  ar ay ) 

write(*,'(‘'  Total  time  in  seconds  =  ” ,el2 .4)  Otaray(l)+taray(2) 

write(*,^("  Storing  Values  "  )0 
open ( 1 , f ile= ' parents . out  * ) 
do  i=l,npoin 

writeCl, '(i7,lx,el6.8) ’)i,x(i) 
end  do 
close(l) 

stop 

end 


♦This  is  the  CHILD  of  a  program  which  solves  the 

♦tridiagonal  system  A  x  =  b  with  periodic  boundary  conditions  using  a 

♦Tridiagonal  Decomposition  Method 

♦as  by  B.  Neta,  C.P.  Katti  and  F.X.  Giraldo. 

♦Programmed  by  F.X.  Giraldo  on  7/95 


program  childS 
include  ’fpvmS.h* 
include  ^paxam.h* 

! Global  Arrays 

dimension  a(nk) ,  b(nk),  c(nk),  d(nk),  x(imax),  xx(nk) 
dimension  f(nk),  g(nk) ,  p(nk) 
integer  t ids (16) 

! Start  PVM 

call  pvmfmytid(  mytid  ) 
call  pvmf parent (  mtid  ) 
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•Receive  Data  from  Parent 


msgtype=l 

call  pvmfrecv(mtid,msgtype,info) 
call  pvmfunpack(integer4,iprocs, 1,1, info) 
call  pvmfunpack(real4,b,nk,l,info) 
call  p vmf unpack (real4, a, nk, 1, info) 
call  p vmf unpack (real4,c,nk, l,info) 
call  pvmfunpack(real4,d,nk, 1, info) 

•Receive  Broadcast 

msgtype=2 

call  pvmfrecv(mtid,msgtype, inf o) 

call  pvmf unpack ( int eger4 , t ids ,16,1, inf o ) 

•Eliminate  b*s 

if  (iprocs .eq. 1)  then 
p(l)=b(l) 
do  j=2,  nk 

xl=b(j)/a(j-l) 
a(j)=a(j)  -  xl*c(j-l) 
p(j)=-xl*p(j-l) 
d(j)=d(j)  -  xl*d(j-l) 
end  do 
else 

f(l)=b(l) 
do  j=2,  nk 

xl=b(j)/a(j-l) 

a(j)=a(j)  -  xl*c(j-l) 
d(j)=d(j)  -  xl*d(j-l) 
end  do 
endif 

! Eliminate  c’s 

if  ( iprocs . eq . 1)  then 
g(nk-l)=c(nk-l) 
do  j=nk  -  2,  1,  -1 
xl=c(j)/a(j+l) 
g(j)=-xl*g(j+l) 

P(j)=p(j)  "  xl*p(j+l) 
d(j)=d(j)  -  xl*d(j+l) 
end  do 
else 

g(nk-l)=c(nk-l) 
do  j=nk  -  2,  1,  “1 
xl=c(j)/a(j+l) 
f(j)=f(j)  ”  xl*f(j+l) 
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g(j)=-xl*g(j+l) 
d(j)=d(j)  -  xl*d(j+l) 
end  do 
endif 

!Send  variables  from  i  to  i-l 

if  (iprocs .gt . 1)  then 
msgtype=3 

call  pvmf initsendCpvmdef ault , inf o) 
call  pvmf pack (real4,f(l) ,1,1, info) 
call  pvmf pack (real4,a(l) ,1,1, info) 
call  pvmfpack(real4,g(l) ,1,1, info) 
call  pvmfpack(real4,d(l) ,1,1, info) 
call  pvmfsend(tids(iprocs-l) ,msgtype, info) 
endif 

! Receive  variables  from  i+1  to  i 

if  (iprocs . It .nprocs)  then 
msgtype=3 

call  pvmfrecv(tids(iprocs+l) ,msgtype, info) 
call  pvmfunpack(real4,ff ,1 , 1 , info) 
call  pvmfunpack(real4,aa,i, l,info) 
call  pvmf unpack (real4, gg, 1, 1, info) 
call  pvmf unpack ( real4, dd, 1, 1, info) 

! Eliminate  C_nk^s 

xl=c(nk)/aa 
g(nk)=”Xl*gg 
a(nk)=a(nk)  -  xl*ff 
d(nk)=d(nk)  -  xl*dd 
endif 

!For  Processor  1  Only  —  Multiply  1st 
!Row  by  -(p_k/p.l)  and  add  to  Kth  Row 

if  (iprocs .eq, 1)  then 
xl=p(nk)/p(l) 
f(nk)=-xl*a(l) 
a(nk)=a(nk)  -  xl*g(l) 
d(nk)=d(nk)  -  xl*d(l) 
endif 

{Construct  Arrow  System 

msgtype=4 

call  pvmf initsendCpvmdef ault , inf o) 
call  pvmf pack ( integer4 , iprocs ,1,1, info) 

if  (iprocs . eq. 1)  then  !For  i=l,2 

call  pvmfpack(real4,p(l) ,1,1, info) 
call  pvmfpack(real4,a(l) ,1,1, info) 
call  pvmfpack(real4,g(l) ,1,1, info) 
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call  pvinfpack(real4,d(l)  ,1,1,  info) 
call  pvmfpack(real4,f (nk) ,1,1, info) 
call  pvmfpack(real4,a(nk) , 1, 1, info) 
call  pvmfpack(real4,g(nk) ,1,1, info) 
call  pvnifpack(real4,d(nk), 1,1, info) 
else  if  (iprocs.gt. 1. and. iprocs.lt .nprocs)  then  !For  i=3,NPR0CS 
call  pvmfpack(real4,f (nk) , 1 , l,info) 
call  pvmfpack(real4,a(nk) , 1, l,info) 
call  pvmfpack(real4,g(nk) , 1, l,info) 
call  pvmfpack(real4,d(nk) ,1, l,info) 
else  if  (iprocs .eq. nprocs)  then  !For  i=NPROCS  +  1 

call  pviiifpack(real4,f  (nk) ,  1, 1 , info) 
call  pvmfpack(real4,a(nk) , 1, l,inf o) 
call  pvmfpack(real4, c(nk) , 1 , l,inf o) 
call  pvmf pack(real4,d(nk) , 1, l,info) 
endif 

call  pvmf send(mtid,msgtype, info) 

IBack  Substitute 
{Receive  Arrow  Solution 

msgtype=5 

call  pvmf recv(mtid,msgtype, info) 
call  pvmfunpack(real4,x,imax, l,info) 

{Obtain  Local  Solution 

if  (iprocs .eq.  1)  then 
do  j=2,  nk  -  1 

xx(j)=(  d(j)  -  g(j)*x(nk)  -  p(j)*x(nk*nprocs)  )/a(j) 
end  do 
else 

do  j=l,  nk  -  1 

xx(j)=(  d(j)  -  f(j)*x((iprocs-l)*nk)  -  g(j)*x(iprocs*nk)  ) 

$  /a(j) 

end  do 
endif 

{Send  Local  Sol  to  Parent 

msgtype=6 

call  pvmf init send (pvmdefault, info) 
call  pvmf pack(integer4, iprocs, 1, 1, info) 
call  p vmf pack (real4, XX, nk, 1, info) 
call  pvmfsend(mtid,msgtype,info) 

stop 

end 

end 
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♦This  is  the  program  which  solves  the 

♦reduced  tridiagonal  system  A  x  =  b  with  periodic  boundary  conditions 
♦Programmed  by  F.X.  Giraldo  on  7/95 

- - 

subrout ine  tridiag_per iodic (d , b , a, c , n) 
include  *param.h^ 

dimension  b(nprocs+l) ,  a(nprocs+l),  c(nprocs+l),  d(nprocs+l) 
dimension  gam2(nprocs+l) 

a(l)=1.0/a(l) 
gam2(l)=~b(l)^a(l) 
b(l)=d(l)^a(l) 
do  i=2,n-l 

c(i-l)=c(i-l)^a(i-l) 
a(i)=1.0/(  a(i)  ^  b(i)^c(i-l)  ) 
gam2(i)=-b(i)^gam2(i-l)^a(i) 
b(i)=(  d(i)  -  b(i)^b(i-l)  )^a(i) 
end  do 

gam2(n-l)=gam2(n-l)  -  c(n-l)^a(n-l) 

d(n-l)=b(n-l) 
a(n-l)=gam2(n*-l) 
do  il=2,n-’l 
i=n-il 
i2=i+l 

d(i)=b(i)  -  c(i)^d(i2) 
a(i)=gam2(i)  -  c(i)^a(i2) 
end  do 

i=n 

zaa=d(i)  -  c(i)^d(l)  -  b(i)^d(n-l) 
zaa=zaa/(  a(i)  +  b(i)^a(n-l)  +  c(i)^a(l)  ) 
d(i)=zaa 
do  i=l,n-l 

d(i)=d(i)+a(i)^zaa 
end  do 

return 

end 
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Appendix  B 


In  this  appendix  we  give  the  pentadiagonal  solver.  The  first  code  is  the 
parent  (master)  followed  by  the  child  (slave).  The  pentadiagonal  solver  for  the 
reduced  system  is  also  given. 

^ - ♦ 

♦This  is  the  PARENT  of  a  program  which  solves  the 

♦pentadiagonal  system  A  x  =  b  with  periodic  boundary  conditions  using  a 

♦Pentadiagonal  Decomposition  Method 

♦as  by  B.  Neta,  C.P.  Katti  and  F.  X.  Giraldo. 

♦Programmed  by  F.X.  Giraldo  on  7/95 

- - - * 

program  parents 
include  *fpvm3.h* 
include  *param.h* 
c  external  timing^f gettod 

‘Global  Arrays 

dimension  s(imax),  r(imax),  a(imax),  b(imax),  c(imax) 
dimension  d(imeLx),  x(imax) 

dimension  s^p(nic),  r_p(nk) ,  a_p(nk),  b_p(nk) ,  c.p(nk) 
dimension  d_p(nk),  x_p(nk) 

dimension  flhat(nhat),  f2hat(nhat),  rhat(nhat) 
dimension  aihatCnhat),  chat(nhat),  glhat(nhat) 
dimension  g2hat(nhat),  dhat(nhat),  xhat(nhat) 
integer  tids(nprocs) ,  itimel(2),  itime2(2),  itotal 
real  tar ay (2) 

character  arch^S,  nodename^lO 

ichild=0 

npoin=nprocs^nk 

do  i=l,npoin  ‘Construct  Linear  System 

s(i)=l 
r(i)=l 
a(i)=4 
c(i)=l 
b(i)=l 
.d(i)=8+i 
end  do 

d(l)=2^npoin  +  8 
d(2)=npoin  +  16 
d(npoin-l)=7^npoin  -  8 
d(npoin)=6^npoin 
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‘Start  PVM 


call  pvmfmytidC  mytid  ) 
call  pvmf parent (  iptid  ) 
nodename=  ^  childS . exe  * 
arch=^*  * 

! Spawn  Processors  and  distribute 
!the  Linear  System  among  processors 

do  i=l,nprocs 

call  pvmf spawn(nodename,pvmdefault, arch, l,tids(i) ,ierr) 
write(*,^(”  Spawning  Processor  tids  ",i2,lx,il0) Oi,tids(i) 
if  (ierr.ne.l)  then 

write(*, '("ierr  =  '*,i3)*)ierr 

write(*, ^ (“Error !  Could  not  spawn  process  #  “,i3)*)i 
call  pvmf exit (ierr) 
stop 
end  if 
do  j=l,nk 

s^p(j)=s(  (i-l)*nk  +  j  ) 

r_p(j)=r(  (i-l)*nk  +  j  ) 

a_p(j)=a(  (i-l)*nk  +  j  ) 

c_p(j)=c(  (i“l)*nk  +  j  ) 

b_p(j)=b(  (i-l)*nk  +  j  ) 

d_p(j)=d(  (i-l)*nk  +  j  ) 

end  do 

msgtype=l 

call  pvmf init send (pvmdefau It , info) 
call  pvmf pack ( int eger4 , i , 1 , 1 , inf o ) 
call  p vmf pack ( real4, s_p, nk, 1, info) 
call  pvmf pack (real4, r_p, nk, 1, info) 
call  p vmf pack ( real4, a_p, nk, 1, info) 
call  pvmfpack(real4,c_p,nk, Ijinfo) 
call  pvmf pack (real4 , b_p , nk , 1 , inf o ) 
call  pvmf pack (real4 , d_p , nk , 1 , inf o ) 
call  pvmf send(tids(i) ,msgtype, info) 
end  do 

•Broadcast  TIDS  to  all  Processors 

c  call  t iming^f gett od (•/.REF (  it ime  1 ) ) 

msgtype=2 

call  pvmf initsend(pvmdefault, info) 

call  pvmf pack( integer4 , t ids ,nprocs , 1 , inf o ) 

call  pvmf mcas t (npro cs , tids , ms gtype, info) 

•Construct  Arrow  System 

t ime 1 =dt ime ( t ar ay ) 


22 


msgtype=4 
do  i=l,nprocs 

call  pvmfrecv(-l,msgtype,iiifo) 

call  pvmf unpack ( integer4 , iprocs ,1,1, info) 

if  (iprocs . eq. 1)  then 

call  pvmf unpack(real4,pl_l ,1,1, info) 

call  pvmfunpack(real4 ,p2_l ,1,1, info) 

call  pvmf unpack (real4,a_l ,1,1, info) 

call  pvmf unpack (real4,c_l ,1,1, info) 

call  pvmfunpack(real4,gl_l ,1,1, info) 

call  pvmfunpack(real4,g2.1 ,1,1, info) 

call  pvmfunpack(real4,d_l,l, l,info) 

call  pvmf unpack (real4,p2_2, 1 , 1 , info) 

call  pvmfunpack(real4,r_2,l, l,info) 

call  pvmf unpack (real4,a_2, 1, 1 , info) 

call  pvmf unpack(real4,gl_2 ,1,1, info) 

call  pvmf unpack (real4,g2_2 ,1,1, info) 

call  pvmfunpack(real4,d_2, 1, 1 , info) 

call  pvmfunpack(real4,f 1.3, 1 , 1 , info) 

call  pvmf unpack (real4,f2_3 ,1,1, info) 

call  pvmf unpack ( r eal4, a.3, 1, 1, info) 

call  pvmfunpack(real4,c.3, 1, 1, info) 

call  pvmf unpack (real4,g 1.3 ,1,1, info) 

call  pvmf unpack(real4,g2.3 ,1,1, info) 

call  pvmfunpack(real4,d_3,l, l,info) 

call  pvmf unpack (r eal4, f 1_4,1, 1, info) 

call  pvmfunpack(real4,f2.4, 1, 1 ,info) 

call  pvmfunpack(real4,r_4, 1 , 1, info) 

call  pvmfunpack(real4,a.4,l, Ijinfo) 

call  pvmf unpack (real4 ,gl_4, 1 , 1 , info) 

call  pvmf unpack (r eal4, g2.4, 1, 1, info) 

call  pvmfunpack(real4,d_4, 1, 1, inf o) 

plhat=pl.l 

p2hat=p2.1 

ahat(l)=a.l 

chat(l)=c.l 

glhat(l)=gl.l 

g2hat(l)=g2_l 

dhat(l)=d.l 

q2hat=p2.2 

rhat(2)=r.2 

ahat(2)=a_2 

glhat(2)=gl.2 

g2hat(2)=g2.2 


!For  i=l,2 
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dhat(2)=d_2 

f lhat(2*iprocs+l)=f 1_3 
f 2hat (2*iprocs+l)=f2_3 

ahat(2*iprocs+l)=a_3 
chat (2*iprocs+l)=c.3 
glhat(2*iprocs+l)=gl_3 
g2hat(2*iprocs+l)=g2_3 
dhat (2* iprocs+1 ) =d_3 
f lhat (2*iprocs+2 ) =f 1.4 
f2hat(2*iprocs+2)=f2.4 
rhat (2*iprocs+2)=r_4 
ahat (2*iprocs+2)=a_4 
glhat(2*iprocs+2)=gl.4 
g2hat(2*iprocs+2)=g2_4 
dhat (2* iprocs+2 ) =d_4 

else  if  (iprocs .gt . 1 . aaid. iprocs . It .nprocs)  then  !For  i=3,NPR0CS 
call  pvmf unpack (real4 ,fl_l ,1,1, info) 
call  pvmf unpack (real4,f2_l , 1,1, info) 
call  pvmf unpack (real4, a. 1 , 1, 1, info) 
call  pvmf unpack (real4,c_l ,1,1, info) 
call  pvmfunpack(real4,gl.l,l, 1, info) 
call  pvmf unpack (real4,g2.1, 1 , 1, info) 
call  pvmf unpack ( r eal4 , d_ 1 , 1 , 1 , inf o ) 
call  pvmf unpack (real4 ,f 1.2 ,1,1, info) 
call  pvmfunpack(real4,f2_2, 1,1, info) 
call  pvmf unpack (real4 , r.2 ,1,1, inf o ) 
call  pvmf unpack (real4,a.2, 1, 1 , info) 
call  pvmfunpack(real4,gl.2, 1, l,info) 
call  pvmf unpack (real4 , g2.2 ,1,1, inf o ) 
call  pvmfunpack(real4,d.2,l, l,info) 
f lhat(2*iprocs+l)=f 1.1 
f2hat (2*iprocs+l)=f2.1 

ahat (2*iprocs+l)=a.l 
chat (2*iprocs+l)=c.l 
glhat(2*iprocs+l)=gl.l 
g2hat ( 2* iprocs+ 1 ) =g2. 1 
dhat (2*iprocs+l)=d.l 
f lhat (2*iprocs+2)=f 1.2 
f2hat(2*iprocs+2)=f2.2 
rhat ( 2* ipr o  c  s +2 ) =r.2 
ahat (2*iprocs+2)=a.2 
glhat(2*iprocs+2)=gl.2 
g2hat(2*iprocs+2)=g2.2 
dhat (2*iprocs+2)=d.2 
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!For  i=NPROCS  +  1 


else  if  (iprocs . eq.nprocs)  then 

call  pvmfunpack(real4,f 1_1, 1 , 1 , info) 
call  pvmf unpack (real4,f2_l , 1,1, info) 
call  pvmf unpack (real4,a_l , 1, 1, inf o) 
call  pvmfunpack(real4,c_l, 1, 1, info) 
call  pvmfunpack(real4,b_l , 1 , l,info) 
call  pvmfunpack(real4,d_l , 1, 1, info) 
call  pvmfunpack(real4,f 1_2, 1 , 1 , info) 
call  pvmf unpack (real4,f2_2, 1 , 1 , info) 
call  pvmf unpack (real4 , r_2 ,1,1, inf o ) 
call  pvmf unpack (real4,a_2 ,1,1, info) 
call  pvmf unpack ( real4 , c_2 ,1,1, inf  o ) 
call  pvmf unpack(real4, b_2, 1, 1, inf o) 
call  pvmf unpack (real4,d_2, 1, 1, inf o) 
f lhat(2*iprocs+l)=f 1^1 
f2hat (2*iprocs+l)=f 2_1 

ahat (2*iprocs+l)=a_l 
chat (2*iprocs+l)=c_l 
vlhat=b_l 

dhat (2*iprocs+l ) =d_l 

f lhat (2*iprocs+2)=f 1_2 

f2hat(2*iprocs+2)=f2.2 

rhat (2*iprocs+2)=r_2 

ahat (2* ipr ocs+2 ) =a_2 

ulhat=c_2 

u2hat=b_2 

dhat (2* iprocs+2 ) =d_2 
endif 
end  do 


! Solve  Tridiagonal  Arrow  System 
call  pent ad iag_per iodic (f lhat , f 2hat , rhat , ahat , chat , glhat , 

$  g2hat , p lhat , p2hat , q2hat , v lhat , u lhat , u2hat , dhat , xhat ) 


! Store  Output  in  corresponding  vector 

x(l)=xhat(l) 
x(2)=xhat(2) 
do  i=l,nprocs 

x(i*nk-l)=xhat (2*i+l) 
x(i*nk)=xhat(2*i+2) 
end  do 

IBack  Substitute 

! Broadcast  TIDS  to  all  Processors 

msgtype=5 

call  pvmf init send (pvmdefault, inf o) 
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call  pvmfpack(real4,x,imax,l,info) 
call  pvmfmcast(nprocs ,tids ,msgtype, info) 

•Receive  Local  Solution  from  Children 

msgtype=6 
do  i=l,nprocs 

call  pvmfrecv(-l ,rasgtype,inf o) 
call  pvmf unpack (integer4 , iprocs ,1,1, info) 
call  pvmfunpack(real4,x_p,nk, 1, inf o) 
call  pvmf unpack ( int  eger4 , it ime ,1,1, inf o ) 
ichild=ichild  +  itime 

print*, ^  Receiving  solution  Processor  =  *, iprocs 
if  (iprocs. eq. 1)  then 
do  j=3,  nk  -  2 
x(j)=x_p(j) 
end  do 
else 

do  j=l,  nk  -  2 

x((iprocs-l)*nk+j )=x_p(j) 
end  do 
endif 
end  do 

c  call  t iming_f gett od (*/,REF (  it ime2 ) ) 

c  iparent= (it ime2(l)-it ime 1 ( 1) )* 1000000  +  itime2(2)~itimel(2) 

c  print*, ^  Children  Time  in  useconds=  ’,ichild 

c  print*,*  Parent  Time  in  useconds  =  *,iparent 

c  print*,*  Total  Time  in  useconds  =  * , ichild+iparent 

t ime2=dt ime (tar ay ) 

write(*,*(”  Total  time  in  seconds  =  " ,el2 .4) * )taray(l)+taray(2) 

write(*,*("  Storing  Values  "  )*) 
open(l,f ile=* parents. out  * ) 
do  i=l,npoin 

writed,  *  (i7,  lx,el2.6)  *  )i,x(i) 
end  do 
close(l) 

stop 

end 


*This  is  the  CHILD  of  a  program  which  solves  the 

*pentadiagonal  system  A  x  =  b  with  periodic  boundary  conditions  using  a 
*Pentadiagonal  Decomposition  Method 


26 


*as  by  B.  Neta,  C.P.  Katti  amd  F.  X.  Giraldo 
♦Programmed  by  F.X.  Giraldo  on  7/95 

- - 

program  childS 
include  ^fpvmS.h’ 
include  ’param.h’ 
c  external  timing_fgettod 

! Global  Arrays 

dimension  s(nk),  r(nk),  a(nk) ,  c(nk),  b(nk) ,  d{nk) 
dimension  xg(imax),  x(nk) 

dimension  fl(nk) ,  f2(nk),  gl(nk),  g2(nk),  pl(nk),  p2(nk) 
integer  tids(nprocs) ,  itimel(2),  itime2(2),  itotal 

! Start  PVM 

call  pvmfmytidC  mytid  ) 
call  pvmf parent (  mtid  ) 

•Receive  Data  from  Parent 


msgtype=l 

call  pvmfrecv(mtid,msgtype,info) 
call  pvmf unpack ( int eger4 , ipro  cs ,1,1, inf  o ) 
call  pvmfunpack(real4,s ,nk, 1, inf o) 
call  pvmfunpack(real4,r,nk, Ijinfo) 
call  pvmf unpack (real4, a, nk, 1, info) 
call  pvmfunpack(real4,c,nk, 1, info) 
call  p vmf unpack ( r eal4, b, nk, 1, info) 
call  pvmfunpack(real4,d,nk, 1, info) 

•Receive  Broadcast 


c  call  timing_fgettod(y,REF(itimel)) 

msgtype=2 

call  pvmfrecv(mtid,msgtype,info) 

call  pvmf unpack (int eger4,t ids ,nprocs , 1, inf o) 

! Eliminate  R  and  S 


if  (iprocs.eq.  1)  then 
pl(l)=s(l) 
p2(l)=r(l) 
p2(2)=s(2) 

do  j=2,nk“l 

xl=r(j)/a(j-l) 
a(j)=a(j)  -  xl*c(j“l) 
c(j)=c(j)  *-  xl*b(j-i) 
pl(j)=pl(j)  -  xl*pl(j-l) 
p2(j)=p2(j)  -  xl*p2(j-l) 
d(j)=d(j)  -  xl*d(j-l) 
xl=s(j+l)/a(j-l) 
r(j+l)=r(j+l)  ~  xl*c(j-l) 


* 
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a(j+l)=a(j+l)  -  xl*b(j-l) 
pl(j+l)=pl(j+l)  -  xl*pl(j-l) 
p2(j+l)=p2(j+l)  -  xl*p2(j-l) 
d(j+l)=d(j+l)  -  xl*d(j-l) 
end  do 
else 
fl(l)=s(l) 
f2(l)=r(l) 
f2(2)=s(2) 

do  j=2,nk-l 

xl=r(j)/a(j-l) 
fl(j)=fl(j)  -  xl*fl(j-l) 
f2(j)=f2(j)  -  xl*f2(j-l) 
a(j)=a(j)  -  xl*c(j-l) 
c(j)=c(j)  -  xl*b(j-l) 
d(j)=d(j)  -  xl*d(j-l) 
xl=s(j+l)/a(j-l) 
fl(j+l)=fl(j+l)  -  xl*fl(j-l) 
f2(j+l)=f2(j+l)  -  xl*f2(j-l) 
r(j+l)=r(j+l)  -  xl*c(j-l) 
a(j+l)=a(j+l)  -  xl*b(j-l) 
d(j+l)=d(j+l)  -  xl*d(j-l) 
end  do 
end  if 


! Eliminate  C  and  B 


gl(nk-3)=b(nk-3) 
gl(nk-2)=c(nk-2) 
g2(nk-2)=b(nk-2) 
if  (iprocs.eq.l)  then 
do  j=nk  -  3,1,  -1 
xl=c(j)/a(j+l) 
gl(j)=gl(j)  -  xl*gl(j+l) 
g2(j)=g2(j)  -  xl*g2(j+l) 
pl(j)=pl(j)  -  xl*pl(j+l) 
p2(j)=p2(j)  -  xl*p2(j+l) 
d(j)=d(j)  -  xl*d(j+l) 
if  (j.gt.l)  then 

xl=b(j-l)/a(j+l) 
gl(j-l)=gl(j-l)  -  xl*gl(j+l) 
g2(j-l)=g2(j-l)  -  xl*g2(j+l) 
pl(j-l)=pl(j-l)  -  xl*pl(j+l) 
p2(j-l)=p2(j-l)  -  xl*p2(j+l) 
d(j-l)=d(j-l)  -  xl*d(j+l) 
endif 
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end  do 
else 

do  j=nk  -  3,1,  -1 
xl=c(j)/a(j+l) 
fl(j)=fl(j)  -  xl*fl(j+l) 
f2(j)=f2(j)  -  xl*f2(j+l) 
gl(j)=gl(j)  -  xl*gl(j+l) 
g2(j)=g2(j)  -  xl+g2(j+l) 
d(j)=d(j)  -  xl*d(j+l) 
if  (j .gt. 1)  then 
xl=b(j-l)/a(j+l) 
fl(j-l)=fl(j-l)  -  xl*fl(j+l) 
f2(j-l)=f2(j-l)  -  xl*f2(j+l) 
gl(j-l)=gl(j-l)  -xl*gl(j+l) 
g2(j-l)=g2(j-l)  -xl*g2(j+l) 
d(j-l)=d(j-l)  -  xl*d(j+l) 
endif 
end  do 
endif 

I  Send  variables  from  i  to  i-1 

if  ( iprocs . gt . 1 )  then 
msgtype=3 

call  pvmf initsend(pvmdef ault , info) 
call  pvmfpack(real4,f 1(1) , 1, 1, info) 
call  pvmfpack(real4,f2(l) , 1, l,info) 
call  pvmfpack(real4,a(l) ,1,1, info) 
call  pvmf pack (real4,gl( 1) , 1, 1 , info) 
call  pvmf pack (real4 , g2 ( 1 ) , 1 , 1 , inf  o ) 
call  pvmfpack(real4,d(l) , 1 , 1 , info) 
call  pvmf pack(real4,f 1(2) ,1,1, info) 
call  pvmf pack(real4,f 2(2) , 1, l,info) 
call  pvmfpack(real4,a(2) ,1,1, info) 
call  pvmfpack(real4,gl(2) , 1, l,info) 
call  pvmfpack(real4,g2(2) , 1, 1, info) 
call  pvmfpack(real4,d(2) ,1,1, info) 
call  pvmfsend(tids(iprocs-l) ,msgtype,info) 
endif 

IReceive  variables  from  i+1  to  i 

if  (iprocs.lt .nprocs)  then 
msgtype=3 

call  pvmfrecv(tids(iprocs+l) ,msgtype, info) 
call  pvmfunpack(real4,f 1„1, 1, l,info) 
call  pvmf unpack(real4,f 2_1 ,1,1, info) 
call  pvmf unpack(real4,a_l, 1,1, info) 
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call  pvmf unpack(real4,gl_l ,1,1, info) 
call  pvmf unpack(real4,g2_l ,1,1, info) 
call  pvmfunpack(real4,d.l, 1, l,info) 
call  pvmfunpack(real4,f 1.2, 1, l,info) 
call  pvmfunpack(real4,f 2_2, 1, l,info) 
call  pvmfunpack(real4,a.2, 1 , 1 , info) 
call  pvmf unpack (real4,gl_2 ,1,1, info) 
call  pvmf unpack (real4,g2.2, 1 , 1 , info) 
call  pvmfunpack(real4,d.2, 1,1, info) 

{Eliminate  c.(nk),  b.(nk)  and  b_(nk-l) 

xl=c(nk)/a.l 
r(nk)=r(nk)  -  xl*fl_l 
a(nk)=a(nk)  ~  xl*f2.1 
gl(nk)=-xl*gl_l 
g2(nk)=-xl*g2_l 
d(nk)=d(nk)  -  xl*d.l 
xl=b(nk)/a.2 
r(nk)=r(nk)  -  xl*fl.2 
a(nk)=a(nk)  -  xl*f2_2 
gl(nk)=gl(nk)  -  xl*gl.2 
g2(nk)=g2(nk)  -  xl*g2.2 
d(nk)=d(nk)  -  xl*d_2 
xl=b(nk-l)/a_l 
a(nk-l)=a(nk-l)  -  xl*fl_l 
c(nk-l)=c(nk-l)  -  xl*f2_l 
gl(nk~l)=-xl*gl_l 
g2(nk-l)=-xl*g2_l 
d(nk-l)=d(nk-l)  -  xl*d_l 
endif 

!For  Processor  1  Only 

if  (iprocs . eq.  1)  then 
c(l)=0.0 

•Eliminate  P1.2 

xl=pl(2)/pl(l) 
r(2)=-xl*a(l) 
gl(2)=gl(2)  -  xl*gl(l) 
g2(2)=g2(2)  -  xl*g2(l) 
p2(2)=p2(2)  -  xl*p2(l) 
d(2)=d(2)  -  xl*d(l) 

•Eliminate  Pl.nk~l 

xl=pl(iik-l)/pl(l) 
fl(nk-l)=-xl*a(l) 
a(nk-l)=a(nk-l)  -  xl*gl(l) 
c(nk-l)=c(iik-l)  -  xl*g2(l) 
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p2(nk-l)=p2(nk-l)  -  xl*p2(l) 
d(nk-l)=d(nk-l)  ~  xl*d(l) 

! Eliminate  Pl_nk 

xl=pl(nk)/pl(l) 
f l(nk)=-xl*a(l) 
r(nk)=r(nk)  -  xl*gl(l) 
a(nk)=a(nk)  -  xl*g2(l) 
p2(nk)=p2(nk)  -  xl*p2(l) 
d(nk)=d(nk)  -  xl*d(l) 

! Eliminate  P2_nk-1 

xl=p2 (nk- 1 ) /p2 ( 2 ) 
fl(nk~l)=fl(nk-l)  -  xl*r(2) 
f2(nk-l)=-xl*a(2) 
a(nk-l)=a(nk-l)  -  xl*gl(2) 
c(nk-l)=c(nk-l)  -  xl*g2(2) 
d(nk-l)=d(nk-l)  -  xl*d(2) 

{Eliminate  P2_nk 

xl=p2(nk)/p2(2) 
fl(nk)=fl(nk)  -  xl*r(2) 
f2(nk)=-xl*a(2) 
r(nk)=r(nk)  -  xl*gl(2) 
a(nk)=a(nk)  -  xl*g2(2) 
d(nk)=d(nk)  -  xl*d(2) 
endif 

{Construct  Arrow  System 

msgtype=4 

call  pvmf init send (pvmdef ault , info) 
call  pvmfpack(integer4 , iprocs ,1,1, info) 

if  (iprocs . eq.  1)  then  !For  i=l,2 

call  pvmf pack(real4 , pi ( 1 ) , 1 , 1 , inf o ) 
call  pvmf pack (real4,p2(l) , 1,1, info) 
call  pvmf pack (real4,a(l) ,1, 1, info) 
call  pvmfpack(real4,c(l) ,1,1, info) 
call  pvmfpack(real4,gl(l) , 1, l,info) 
call  pvmf pack (real4 , g2 ( 1 ) , 1 , 1 , inf o ) 
call  pvmf pack (real4,d(l) , 1 , 1 , info) 
call  pvmfpack(real4,p2(2) , 1, l,info) 
call  pvmfpack(real4,r(2) ,1,1, info) 
call  pvmf pack (real4, a(2) ,1 , 1, info) 
call  pvmfpack(real4,gl(2) , 1 , l,info) 
call  pvmfpack(real4,g2(2) , 1, l,info) 
call  pvmfpack(real4,d(2) ,1,1, info) 
call  pvmfpack(real4,fl(nk-l), 1,1, info) 
call  pvmfpack(real4,f2(nk-l) , 1,1, inf o) 
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call  pvinfpack(real4,a(nk-l) ,  1,1,  info) 
call  pvmfpack(real4, c(nk~l) ,1,1, info) 
call  pvmf pack (real4,gi(nk-l) ,1,1, info) 
call  pvmf pack (real4,g2(nk-l) ,1,1, info) 
call  pvmfpack(real4,d(nk-l) ,1, 1, info) 
call  pvmf pack (real4,fl(nk) ,1,1, info) 
call  pvmfpack(real4,f2(nk) ,1,1, info) 
call  pvmf pack (real4,r(nk) , 1 , 1, info) 
call  pvmfpack(real4,a(nk) , 1 , 1, info) 
call  pvmf pack (real4,gl(nk) ,1, l,info) 
call  pvmfpack(real4,g2(nk) , 1, l,info) 
call  pvmf pack (real4,d(nk) ,1,1, info) 
else  if  (iprocs.gt. 1. and. iprocs.lt. nprocs)  then  !For  i=3,NPR0CS 
call  pvmfpack(real4,f l(nk-l) ,1,1, info) 
call  pvmfpack(real4,f2(nk-l) ,1,1, info) 
call  pvmfpack(real4,a(nk-l) ,1,1, info) 
call  pvmfpack(real4,c(nk-l) , 1 , l,info) 
call  pvmfpack(real4,gl(nk“l) ,1,1, info) 
call  pvmfpack(real4,g2(nk-l) ,1,1, info) 
call  pvmf pack (real4,d(nk-l) ,1,1, info) 
call  pvmfpack(real4,f l(nk) ,1,1, info) 
call  pvmfpack(real4,f2(nk) , 1 , 1, inf o) 
call  pvmfpack(real4,r(nk) , 1, l,inf o) 
call  pvmf pack (real4,a(nk) ,1,1, info) 
call  pvmfpack(real4,gl(nk) , 1 , 1, info) 
call  pvmf pack (real4,g2(nk) , 1 , 1, info) 
call  pvmfpack(real4,d(nk) , 1 , l,info) 
else  if  (iprocs .eq.nprocs)  then  !For  i=NPROCS+l 

call  pvmf pack (real4,fl(nk-l) ,1,1, info) 
call  pvmfpack(real4,f2(nk-l) ,1,1, info) 
call  pvmf pack(real4,a(nk-l) ,1,1, info) 
call  pvmfpack(real4,c(nk-l) ,1,1, info) 
call  pvmf pack (real4 , b (nk-1 ) ,1,1, inf o ) 
call  pvmfpack(real4,d(nk-l) ,1,1, info) 
call  pvmf pack (real4, f l(nk) , 1 , 1, inf o) 
call  pvmfpack(real4,f2(nk) ,1,1, info) 
call  pvmf pack (real4, r (nk) , 1 , 1 , inf o) 
call  pvmfpack(real4,a(nk), 1,1, info) 
call  pvmfpack(real4,c(nk) , 1 , l,info) 
call  pvmfpack(real4,b(nk) , 1, l,info) 
call  pvmfpack(real4,d(nk) , 1, l,info) 
endif 

call  pvmf send(mtid,msgtype, info) 

!Back  Substitute 
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IReceive  Arrow  Solution 


msgtype=5 

call  pvmfrecv(mtid,msgtype,info) 
call  pvmf unpack (real4 , xg , imax , 1 , info ) 

! Obtain  Local  Solution 

if  (iprocs . eq.  1)  then 
do  j=3,  nk  “  2 

x(j)=(  d(j)  -  gl(j)*xg(nk-l)  -  g2(j)*xg(nk) 

$  -  pl(j)*xg(nk*nprocs-l)  -  p2(j)*xg(nk*nprocs)  )/a(j) 

end  do 
else 

do  j=l,  nk  -  2 

x(j)=(  d(j)  -  fl(j)*xg((iprocs-l)*nk-l) 

$  -  f2(j)*xg((iprocs-l)*nk)  -  gl( j )*xg(iprocs*nk-l) 

$  -  g2(j)*xg(iprocs*nk)  )/a(j) 

end  do 
endif 

c  call  timing_fgettod(*/,REF(itime2)) 

c  itotal=(itime2(l)-itimel(l))*1000000  +  itime2(2)-itimel(2) 

!Send  Local  Solution  to  Parent 

msgtype=6 

call  pvmf init send (pvmdefault, info) 
call  pvmf pack ( integer4 , iprocs ,1,1, info) 
call  pvmfpack(real4,x,nk, l,inf o) 
call  pvmfpack(integer4,itotal,l, Ijinfo) 
call  pvmfsend(mtid,msgtype,info) 

stop 

end 

♦ - - * 

♦This  program  solves  a 

♦smaller  pentadiagonal  system  A  x  =  b  with  periodic  boundary  conditions 
♦using  a  matrix  partitioning  approach. 

♦Written  on  7/95,  by  F,  X.  Giraldo 


♦ - - 

subroutine  pentadiag_per iodic (f lhat , f 2hat , rhat , ahat , chat , glhat , 

$  g2hat , p lhat , p2hat , q2hat , v lhat , ulhat , u2hat , dhat , xhat ) 


include  ^param.h* 

dimension  f lhat (nhat) ,  f2hat(nhat),  rhat(nhat),  ahat(nhat) 
dimension  chat(nhat),  glhat(nhat),  g2hat(nhat),  dhat(nhat) 
dimension  xhat(nhat),  d2hat(nhat),  d3hat(nhat) 
dimension  xlhat(nhat),  x2hat(nhat),  x3hat(nhat) 

! Forward  Reduction 
! Forward  Reduction 
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d2hat(l)=-plhat 
d2hat (nhat-3)=-glhat (nhat-3) 
d2hat (nhat-2 ) =-glhat (nhat-2 ) 
d3hat ( 1 ) =-p2hat 
d3hat(2)=~q2hat 
d3hat(nhat~3)=-g2hat (nhat-3) 
d3hat ( nhat-2 ) =-g2hat (nhat-2 ) 

do  i=l,nhat-5,2 

! Eliminate  rhat_i+l 

xl=rhat ( i+ 1 ) /ahat ( i ) 
ahat(i+l)=ahat(i+l)  ~  xl*chat(i) 
glhat(i+l)=glhat(i+l)  -  xl*glhat(i) 
g2hat(i+l)=g2hat(i+l)  -  xl*g2hat(i) 
dhat(i+l)=dhat(i+l)  -  xl*dhat(i) 
d2hat(i+l)=d2hat(i+l)  -  xl*d2hat(i) 
d3hat(i+l)=d3hat(i+l)  -  xl*d3hat(i) 

! Eliminate  flhat_i+2 

xl=f lhat ( i+2 ) /ahat ( i ) 
f2hat (i+2)=f2hat(i+2)  -  xl*chat(i) 
ahat ( i+2 )=ahat( i+2)  -  xl*glhat(i) 
chat ( i+2 )=chat( i+2)  -  xl*g2hat(i) 
dhat(i+2)=dhat (i+2)  -  xl*dhat(i) 
d2hat(i+2)=d2hat(i+2)  -  xl*d2hat(i) 
d3hat(i+2)=d3hat(i+2)  -  xl*d3hat(i) 

{Eliminate  flhat_i+3 

xl=f lhat ( i+3 ) /ahat ( i ) 
f2hat(i+3)=f2hat(i+3)  -  xl*chat(i) 
rhat (i+3)=rhat(i+3)  -  xl*glhat(i) 
ahat ( i+3 )=ahat( i+3)  -  xl*g2hat(i) 
dhat(i+3)=dhat(i+3)  -  xl*dhat(i) 
d2hat(i+3)=d2hat(i+3)  -  xl*d2hat(i)  • 
d3hat(i+3)=d3hat(i+3)  -  xl*d3hat(i) 

{Eliminate  f2hat_i+2 

xl=f 2hat ( i+2 ) /ahat ( i+1 ) 
ahat ( i+2 )=ahat( i+2)  -  xl*glhat (i+1) 
chat ( i+2 )=chat( i+2)  -  xl*g2hat (i+1) 
dhat(i+2)=dhat(i+2)  -  xl*dhat(i+l) 
d2hat(i+2)=d2hat(i+2)  -  xl*d2hat (i+1) 
d3hat(i+2)=d3hat(i+2)  -  xl*d3hat (i+1) 

{Eliminate  f2hat_i+3 

xl=f 2hat ( i+3 ) /ahat ( i+1 ) 
rhat(i+3)=rhat(i+3)  -  xl*glhat (i+1) 
ahat ( i+3 )=cLhat( i+3)  -  xl*g2hat (i+1) 
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dhat(i+3)=dhat(i+3)  -  xl*dhat(i+l) 
d2hat(i+3)=d2hat(i+3)  -  xl*d2hat(i+l) 
d3hat(i+3)=d3hat(i+3)  -  xl*d3hat (i+1) 
end  do 

! Eliminate  rhat_nhat-3 

i=nhat  -  3 

xl=rhat ( i+1 ) /ahat ( i) 
ahat(i+l)=ahat(i+l)  ~  xl*chat(i) 
glhat(i+l)=glhat(i+l)  -  xl*glhat(i) 
g2hat(i+l)=g2hat(i+l)  -  xl*g2hat(i) 
dhat(i+l)=dhat(i+l)  -  xl*dhat(i) 
d2hat(i+l)=d2hat(i+l)  -  xl*d2hat(i) 
d3hat(i+l)=d3hat(i+l)  -  xl*d3hat(i) 

IBack  Substitution 
!Back  Substitution 
•Solve  for  NHAT-2  and  NHAT-3 

i=nhat"2 

ai=l/ahat(i) 

X lhat ( i ) =ai *dhat ( i ) 
x2hat ( i ) =ai*d2hat ( i) 
x3hat ( i ) =ai*d3hat ( i ) 
i=nhat~3 
ai=l/aLhat  (i) 

xlhat(i)=ai*(  dhat(i)  -  chat(i)*xlhat(i+l)  ) 
x2hat(i)=ai*(  d2hat(i)  -  chat (i)*x2hat( i+1)  ) 
x3hat(i)==ai*(  d3hat(i)  -  chat (i)*x3hat( i+1)  ) 

! Solve  for  I=NHAT-4, ... ,1 

do  i=nhat-5, 1,“2 
k=i+l 

ai=l/ahat(k) 

xlhat (k)=ai* (dhat (k) -glhat (k) *xlhat (k+l)-g2hat (k) ♦xlhat (k+2) ) 
x2hat(k)=ai*(d2hat(k)-glhat(k)*x2hat(k+l)-g2hat(k)*x2hat(k+2)) 
x3hat (k)=ai* (dShat (k)-glhat (k)*x3hat (k+l)-g2hat (k)*x3hat (k+2) ) 
k=i 

ai=l/aLhat(k) 

xlhat (k)=ai*(dhat(k)-chat(k)*xlhat(k+l)-glhat(k)*xlhat(k+2)*“ 

$  g2hat (k ) ♦xlhat (k+3 ) ) 

x2hat(k)=ai*(d2hat(k)-chat(k)+x2hat(k+l)-glhat(k)^x2hat(k+2)- 
$  g2hat(k)+x2hat(k+3)) 

x3hat(k)=ait(d3hat(k)-chat(k)+x3hat(k+l)-glhat(k)+x3hat(k+2)- 
$  g2hat(k)+x3hat(k+3)) 

end  do 

! Solve  for  NHAT-1  and  NHAT 


i=nhat-l 
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aal=vlhat*x2hat(l)  +  f lhat (i)*x2hat (i-2)  +  f 2hat (i)*x2hat(i-l)  + 

$  ahat ( i ) 

ccl=vlhat*x3hat (1)  +  f lhat (i)*x3hat (i-2)  +  f 2hat (i)*x3hat(i-l)  + 

$  chat(i) 

ddl=dhat(i)  -  vlhat+xlhat (l)  -  f lhat (i)*xlhat (i-2)  - 
$  f 2hat (i)*xlhat (i-1) 

i=nhat 

bb2=ulhat*x2hat(l)  +  u2hat*x2hat (2)  +  f lhat(i)*x2hat (i-3)  + 

$  f 2hat(i)*x2hat(i-2)  +  rhat(i) 

aa2=ulhat*x3hat (1)  +  u2hat*x3hat(2)  +  f lhat (i)*x3hat (i~3)  + 

$  f 2hat(i)*x3hat(i-2)  +  ahat(i) 

dd2=dhat(i)  -  ulhat*xlhat(l)  -  u2hat*xlhat (2)  - 
$  f lhat (i) *x lhat (i“3)  -  f 2hat (i) *xlhat (i-2) 

xl=bb2/aal 
aa2=aa2  -  xl*ccl 
dd2=dd2  -  xl*ddl 
xhat (nhat)=dd2/aa2 

xhat(nhat-l)=(  ddl  -  ccl*xhat (nhat)  )/aal 
do  i=l,nhat-2 

xhat(i)=xlhat(i)  +  xhat (nhat-l)*x2hat (i)  +  xhat (nhat)*x3hat(i) 
end  do 

return 

end 
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